retryOnConflict shouldn't retry on NotFound #3192

Merged: 2 commits, merged Feb 9, 2025


@haouc (Contributor) commented Feb 6, 2025

What type of PR is this?

Which issue does this PR fix?:

The issue is not consistently reproducible; we had to work with the customer to reproduce it in their environment.

What does this PR do / Why do we need it?:
When a pod is deleted, the CNI plugin sends a DelNetwork gRPC request to ipamd and waits for its reply. In some cases, however, the gRPC call returned an error reply, as in these plugin logs:

{"level":"info","ts":"2025-01-17T19:53:11.581Z","caller":"routed-eni-cni-plugin/cni.go:314","msg":"Received CNI del request: ContainerID(8b165f00e3b140f3e5606324bf8e4f03da5e9ef88c63ecf74b9789df201048bb) Netns(/var/run/netns/cni-31295822-c477-03eb-5eea-eeb71cf4ecb0) IfName(eth0) Args(K8S_POD_UID=4554f789-259a-4fa2-ba49-03c5439d4066;IgnoreUnknown=1;K8S_POD_NAMESPACE=xxx;K8S_POD_NAME=xxx;K8S_POD_INFRA_CONTAINER_ID=8b165f00e3b140f3e5606324bf8e4f03da5e9ef88c63ecf74b9789df201048bb) Path(/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.4.0\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"DEBUG\",\"podSGEnforcingMode\":\"standard\",\"prevResult\":{\"cniVersion\":\"0.4.0\",\"interfaces\":[{\"name\":\"eni2621c471e82\"},{\"name\":\"eth0\",\"sandbox\":\"/var/run/netns/cni-31295822-c477-03eb-5eea-eeb71cf4ecb0\"},{\"name\":\"dummy2621c471e82\",\"mac\":\"0\",\"sandbox\":\"1\"}],\"ips\":[{\"version\":\"4\",\"interface\":1,\"address\":\"10.110.131.105/32\"}],\"dns\":{}},\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"error","ts":"2025-01-17T19:53:11.583Z","caller":"routed-eni-cni-plugin/cni.go:314","msg":"Error received from DelNetwork gRPC call for container 8b165f00e3b140f3e5606324bf8e4f03da5e9ef88c63ecf74b9789df201048bb: rpc error: code = Unknown desc = error while trying to retrieve pod info: Pod \"xxx\" not found"}
{"level":"info","ts":"2025-01-17T19:53:11.583Z","caller":"routed-eni-cni-plugin/cni.go:314","msg":"Could not teardown pod using prevResult: ContainerID(8b165f00e3b140f3e5606324bf8e4f03da5e9ef88c63ecf74b9789df201048bb) Netns(/var/run/netns/cni-31295822-c477-03eb-5eea-eeb71cf4ecb0) IfName(eth0) PodNamespace(xxx) PodName(xxx)"}

The corresponding ipamd log shows:

2025-01-17T19:53:11.582928247Z stdout F {"level":"error","ts":"2025-01-17T19:53:11.582Z","caller":"rpc/rpc.pb.go:881","msg":"Failed to delete the pod annotation: error while trying to retrieve pod info: Pod \"xxx\" not found"}
2025-01-17T19:53:11.582931182Z stdout F {"level":"info","ts":"2025-01-17T19:53:11.582Z","caller":"rpc/rpc.pb.go:881","msg":"Send DelNetworkReply: IPv4Addr: 10.110.131.105, IPv6Addr: , DeviceNumber: 1, err: error while trying to retrieve pod info: Pod \"xxx\" not found"}

This error appears to originate from https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/ipamd/ipamd.go#L1987

Returning an error can cause the plugin to skip cleaning up the pod's IP rules. If the pod is already gone, there is no reason to skip the cleanup.

Testing done on this change:

Will this PR introduce any new dependencies?:

Will this break upgrades or downgrades? Has updating a running cluster been tested?:

Does this change require updates to the CNI daemonset config files to work?:

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@haouc requested a review from a team as a code owner February 6, 2025 05:44
@@ -2001,6 +2001,11 @@ func (c *IPAMContext) AnnotatePod(podName string, podNamespace string, key strin
if err == nil && pod == nil {
log.Warnf("get a nil pod for pod name %s and namespace %s", podName, podNamespace)
}
// since the GetPod() error has been decorated, we have to check key words
Member

I would reword the comment to something like the following to make it clearer, but we can do that as a follow-up:

releasedIP is only set in the DelNetwork RPC call. We don’t want to error out on removing PodAnnotation when Pod is deleted and IP is being released.

@orsenthil (Member) left a comment

LGTM.

@orsenthil merged commit 5b69f3e into aws:master Feb 9, 2025
6 checks passed
orsenthil added a commit that referenced this pull request Feb 10, 2025
Co-authored-by: Senthil Kumaran <senthilx@amazon.com>
orsenthil added a commit that referenced this pull request Feb 19, 2025
* Update to Changelog, config and scripts. (#3095) (#3107)

* Update to Changelog, config and scripts.

* Add Version in Changelog.

Co-authored-by: Senthil Kumaran <senthilx@amazon.com>

* Update NP strict mode doc (#3125)

* adding email to send log bundle  (#3134)

* Fix issues handling unmanaged ENIs with IPv6 only (#3122)

* Bump go.uber.org/zap from 1.26.0 to 1.27.0

Bumps [go.uber.org/zap](https://github.com/uber-go/zap) from 1.26.0 to 1.27.0.
- [Release notes](https://github.com/uber-go/zap/releases)
- [Changelog](https://github.com/uber-go/zap/blob/master/CHANGELOG.md)
- [Commits](uber-go/zap@v1.26.0...v1.27.0)

---
updated-dependencies:
- dependency-name: go.uber.org/zap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/stretchr/testify from 1.9.0 to 1.10.0

Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](stretchr/testify@v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/onsi/gomega from 1.35.1 to 1.36.0

Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.35.1 to 1.36.0.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.35.1...v1.36.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/prometheus/common from 0.60.0 to 0.60.1

Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.60.0 to 0.60.1.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/RELEASE.md)
- [Commits](prometheus/common@v0.60.0...v0.60.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update changelog from release-1.19 branch to master branch. (#3136)

* Update to Changelog, config and scripts. (#3095) (#3107) (#3108)

* Update to Changelog, config and scripts.

* Add Version in Changelog.

Co-authored-by: Senthil Kumaran <senthilx@amazon.com>

* Updating Manifest, Changelog and scripts (#3115)

* Update to Changelog, config and scripts. (#3095) (#3107) (#3118)

* Update to Changelog, config and scripts.
* Add Version in Changelog.

Co-authored-by: Senthil Kumaran <senthilx@amazon.com>

* fixed the changelog.

---------

Co-authored-by: Jay Deokar <23660509+jaydeokar@users.noreply.github.com>

* Bump github.com/onsi/ginkgo/v2 from 2.20.1 to 2.22.0

Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.20.1 to 2.22.0.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.20.1...v2.22.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump golang.org/x/sys from 0.26.0 to 0.27.0 in /test/agent

Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.26.0 to 0.27.0.
- [Commits](golang/sys@v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump golang.org/x/sys from 0.27.0 to 0.28.0 in /test/agent

Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.27.0 to 0.28.0.
- [Commits](golang/sys@v0.27.0...v0.28.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix KOps Integration Test (#3140)

* scripts lib integration: add more logging steps

* scripts lib cluster: increase kops control plane node size

* run make generate-limits to update the max pods file (#3141)

* Update AWS VPC CNI to SDK V2 Update - master branch (#3070)

* Update AWS SDK to Version 2 and Remove V1 Dependency. Fixes #3116

* Handle EKS Service for the Beta Endpoint. (#3143)

* Adding multus v4.1.4 manifest (#3154)

* scripts integration: capture exit codes from both tests (#3149)

* fix(test): add volume mount for docker-func-test target (#3160)

Signed-off-by: Omer Aplatony <omerap12@gmail.com>

* cni-metrics-helper metrics: do type assertion before type casting (#3152)

* cni-metrics-helper metrics: do type assertion before type casting

* utils prometheusmetrics: remove counters from cni metrics mapping func

* Bump helm.sh/helm/v3 from 3.15.2 to 3.16.4

Bumps [helm.sh/helm/v3](https://github.com/helm/helm) from 3.15.2 to 3.16.4.
- [Release notes](https://github.com/helm/helm/releases)
- [Commits](helm/helm@v3.15.2...v3.16.4)

---
updated-dependencies:
- dependency-name: helm.sh/helm/v3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/aws/aws-sdk-go-v2/service/autoscaling

Bumps [github.com/aws/aws-sdk-go-v2/service/autoscaling](https://github.com/aws/aws-sdk-go-v2) from 1.50.0 to 1.51.2.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@service/s3/v1.50.0...service/s3/v1.51.2)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/autoscaling
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/aws/aws-sdk-go-v2/service/iam from 1.38.1 to 1.38.3

Bumps [github.com/aws/aws-sdk-go-v2/service/iam](https://github.com/aws/aws-sdk-go-v2) from 1.38.1 to 1.38.3.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@service/s3/v1.38.1...service/s3/v1.38.3)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/iam
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update Changelog and Version for CNI 1.19.2 (#3171)

* Bump github.com/aws/aws-sdk-go-v2/feature/ec2/imds (#3166)

Bumps [github.com/aws/aws-sdk-go-v2/feature/ec2/imds](https://github.com/aws/aws-sdk-go-v2) from 1.16.19 to 1.16.22.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@service/ram/v1.16.19...service/ram/v1.16.22)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/feature/ec2/imds
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add CNINode to cache filter (#3164)

We should reduce the number of CNINode object VPC CNI watches for to
just the node it is managing as well.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Co-authored-by: Hao Zhou <zhuhz@amazon.com>
Co-authored-by: Harish Kuna <hakuna@amazon.com>

* fix: remove null creationTimestamp from CRD metadata (#3163)

Signed-off-by: Omer Aplatony <omerap12@gmail.com>
Co-authored-by: Senthil Kumaran <senthilx@amazon.com>

* Fix issue with primary ENI ip lookup when an ENI has both IPv4 and IPv6 address. (#3156)

* Use awshttp client instead of smithy httpclient. (#3193)

* Use awshttp client.

* Update .go-version.

* retryOnConflict shouldnt' retry on NotFound (#3192)

Co-authored-by: Senthil Kumaran <senthilx@amazon.com>

* Update awsutils.go (#3191)

Updated typo for AssignPrivateIpv6Addresses to AssignIpv6Addresses

Co-authored-by: Senthil Kumaran <senthilx@amazon.com>

* Bump github.com/aws/aws-sdk-go-v2/service/cloudwatch

Bumps [github.com/aws/aws-sdk-go-v2/service/cloudwatch](https://github.com/aws/aws-sdk-go-v2) from 1.43.0 to 1.43.12.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](aws/aws-sdk-go-v2@service/s3/v1.43.0...service/cloudwatch/v1.43.12)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/cloudwatch
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/aws/aws-sdk-go-v2/service/autoscaling

Bumps [github.com/aws/aws-sdk-go-v2/service/autoscaling](https://github.com/aws/aws-sdk-go-v2) from 1.51.2 to 1.51.10.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](aws/aws-sdk-go-v2@service/s3/v1.51.2...service/autoscaling/v1.51.10)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/autoscaling
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/prometheus/common from 0.60.1 to 0.62.0

Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.60.1 to 0.62.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/RELEASE.md)
- [Commits](prometheus/common@v0.60.1...v0.62.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump golang.org/x/sys from 0.28.0 to 0.29.0 in /test/agent

Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.28.0 to 0.29.0.
- [Commits](golang/sys@v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump golang.org/x/sys from 0.29.0 to 0.30.0 in /test/agent (#3198)

* Bump github.com/aws/aws-sdk-go-v2/service/cloudwatch (#3199)

* Bump github.com/aws/aws-sdk-go-v2/service/autoscaling

Bumps [github.com/aws/aws-sdk-go-v2/service/autoscaling](https://github.com/aws/aws-sdk-go-v2) from 1.51.10 to 1.51.12.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](aws/aws-sdk-go-v2@service/fsx/v1.51.10...service/autoscaling/v1.51.12)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/autoscaling
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/samber/lo from 1.39.0 to 1.49.1 (#3184)

* Bump github.com/aws/aws-sdk-go-v2/service/eks from 1.52.1 to 1.58.0 (#3200)

* Add grpc call to fetch networkpolicymode from NP (#3202)

* add rpc call to fetch np mode

* go generate

* nit: change print %t to %v

* Bug Fix: "utils prometheusmetrics: convert gauges to counters (#3093)""

This reverts commit e9af9f3 which
removed it in CNI 1.19.2 with fix in master.

* Fix issues handling unmanaged ENIs with IPv6 only (#3122)

This reverts commit 0a200d6 which
reverted only in CNI 1.19.2 with fix in master.

* Changes to attach probes at pod start

* minor error change

* do not ret error on grpc dial

* add dial with context

* update mocked grpc wrapper and unit tests

add new lines to satisfy format check

update unit tests for DialContext

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Co-authored-by: Jay Deokar <23660509+jaydeokar@users.noreply.github.com>
Co-authored-by: pavanipt <pavanip2201@gmail.com>
Co-authored-by: Yash Thakkar <ythakkar97@gmail.com>
Co-authored-by: Gavin Bunney <409207+gavinbunney@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shehbaj Dhillon <dshehbaj@amazon.com>
Co-authored-by: Todd Neal <tnealt@amazon.com>
Co-authored-by: Omer Aplatony <omerap12@gmail.com>
Co-authored-by: Davanum Srinivas <davanum@gmail.com>
Co-authored-by: Hao Zhou <zhuhz@amazon.com>
Co-authored-by: Harish Kuna <hakuna@amazon.com>
Co-authored-by: Hao Zhou <haouc@users.noreply.github.com>
Co-authored-by: Parikshit Patel <parixitpatel@gmail.com>
Co-authored-by: Pavani Panakanti <pavanipt@amazon.com>
Labels
None yet
Projects
None yet

2 participants